Goodness-of-Fit Measures for Induction Trees

Authors

  • Gilbert Ritschard
  • Djamel A. Zighed
Abstract

This paper is concerned with the goodness-of-fit of induced decision trees. Specifically, we explore the possibility of measuring goodness-of-fit as is classically done in statistical modeling. We show how Chi-square statistics, and especially the Log-likelihood Ratio statistic that is widely used in the modeling of cross tables, can be adapted for induction trees. The Log-likelihood Ratio statistic is suited not only for testing goodness-of-fit; it also allows testing the significance of the difference in fit between two nested trees. In addition, we derive pseudo R²'s from it. We also propose adapted forms of the Akaike (AIC) and Bayesian (BIC) information criteria that prove useful in selecting the best compromise model between fit and complexity.
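As a rough illustration of the quantities named in the abstract, the following Python sketch computes the standard Log-likelihood Ratio statistic G² = 2 Σ n ln(n / n̂) for a tree viewed as a grouping of the columns of a cross table. The abstract does not give the paper's exact definitions, so everything here is an assumption: the layout of the table (rows are response classes, columns are predictor profiles), the helper names `tree_expected_counts`, `g2_statistic`, `pseudo_r2`, `aic_like` and `bic_like`, the deviance-ratio form of the pseudo R², and the generic "deviance plus penalty" forms of the information criteria. The example counts are made up for illustration.

```python
import math


def tree_expected_counts(target, leaf_of_column):
    """Counts predicted by a tree for each cell of the target table.

    Assumption: target[i][j] is the count of class i for profile j, and
    leaf_of_column[j] names the leaf that profile j falls into. Every
    profile in a leaf receives that leaf's class distribution.
    Each leaf is assumed to contain at least one case.
    """
    n_rows, n_cols = len(target), len(target[0])
    leaf_counts = {}
    for j, leaf in enumerate(leaf_of_column):
        counts = leaf_counts.setdefault(leaf, [0.0] * n_rows)
        for i in range(n_rows):
            counts[i] += target[i][j]
    expected = [[0.0] * n_cols for _ in range(n_rows)]
    for j, leaf in enumerate(leaf_of_column):
        col_total = sum(target[i][j] for i in range(n_rows))
        leaf_total = sum(leaf_counts[leaf])
        for i in range(n_rows):
            expected[i][j] = col_total * leaf_counts[leaf][i] / leaf_total
    return expected


def g2_statistic(observed, expected):
    """Log-likelihood ratio statistic; empty observed cells contribute 0."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for n, n_hat in zip(obs_row, exp_row):
            if n > 0:
                total += n * math.log(n / n_hat)
    return 2.0 * total


def pseudo_r2(g2_tree, g2_root):
    """Assumed form: share of the root node's deviance removed by the tree."""
    return 1.0 - g2_tree / g2_root


def aic_like(g2, n_params):
    """Generic AIC-style criterion: deviance plus 2 per free parameter."""
    return g2 + 2.0 * n_params


def bic_like(g2, n_params, n_cases):
    """Generic BIC-style criterion: deviance plus ln(n) per free parameter."""
    return g2 + n_params * math.log(n_cases)


# Hypothetical data: 2 response classes (rows) by 4 predictor profiles (columns).
target = [[10, 20, 5, 15],
          [30, 10, 25, 5]]
g2_root = g2_statistic(target, tree_expected_counts(target, [0, 0, 0, 0]))
g2_tree = g2_statistic(target, tree_expected_counts(target, [0, 1, 0, 1]))
print("G2 root:", g2_root)
print("G2 two-leaf tree:", g2_tree)
print("G2 difference (nested trees):", g2_root - g2_tree)
print("pseudo R2:", pseudo_r2(g2_tree, g2_root))
```

In this sketch a tree with one leaf per profile reproduces the table exactly and has G² = 0, while the root node alone gives the largest G². The difference in G² between two nested trees would typically be compared with a Chi-square distribution whose degrees of freedom equal the difference in free parameters; the parameter counts are left to the caller because the abstract does not specify them.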


Similar resources

Flood Hydrograph Simulation with Uncertainty in Rainfall - Runoff Parameters

Flood hydrograph simulation is affected by uncertainty in Rainfall-Runoff (RR) parameters. Uncertainty of RR parameters in the Gharasoo catchment, part of the great Karkheh river basin, is evaluated using a Monte Carlo (MC) approach. A conceptual-distributed model, called ModClark, was used for basin simulation, in which the basin's hydrograph was determined using the superposition of runoff generated...


Statistical Preprocessing for Decision Tree Induction

Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly-used goodness measures are...


TreeSAAP: Selection on Amino Acid Properties using phylogenetic trees

The software program TreeSAAP measures the selective influences on 31 structural and biochemical amino acid properties during cladogenesis, and performs goodness-of-fit and categorical statistical tests.


The Comparison Between Goodness of Fit Tests for Copula

Copula functions, used as a model, can show the relationship between variables. An appropriate copula function for a specific application is one that best shows the dependency between the data. Goodness of fit tests are, theoretically, the best way to select a copula function. Different goodness of fit methods exist for copulas. In this paper we will examine the goodness of fit test...


Publication date: 2003